@lovedheart

./build/bin/Release/test-backend-ops.exe perf -o MUL_MAT -p type_a=iq1_m

Tested on an AMD Ryzen 7 8845HS with the Radeon 780M iGPU.

| n | PR: μs/run | PR: GFLOPS | Main: μs/run | Main: GFLOPS | Speedup vs Main |
|---|---|---|---|---|---|
| 1 | 224.28 | 523.63 | 282.44 | 415.80 | 1.26x |
| 2 | 310.53 | 756.38 | 385.04 | 610.01 | 1.24x |
| 3 | 408.65 | 862.15 | 515.79 | 683.08 | 1.26x |
| 4 | 589.40 | 797.02 | 1244.08 | 377.60 | 2.11x |
| 5 | 1075.96 | 545.75 | 4427.85 | 132.62 | 4.11x |
| 8 | 2576.61 | 364.64 | 4985.43 | 188.45 | 1.94x |
| 512 | 11601.05 | 5180.00 | 11948.15 | 5030.00 | 1.03x |
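
For readers checking the last column, the speedup appears to be the ratio of the main-branch time per run to the PR time per run. The sketch below assumes that definition; the function name `speedup` is made up for illustration and is not part of `test-backend-ops`.

```cpp
#include <cassert>

// Speedup of the PR over main, from mean time per run in microseconds.
// Assumption: speedup = main time / PR time, so values above 1 favor the PR.
double speedup(double main_us, double pr_us) {
    assert(pr_us > 0.0 && "PR time per run must be positive");
    return main_us / pr_us;
}
```

Calling `speedup(282.44, 224.28)` with the n = 1 row gives roughly 1.26, and `speedup(1244.08, 589.40)` with the n = 4 row gives roughly 2.11.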

@lovedheart requested a review from 0cc4m as a code owner on November 1, 2025, 00:03
@github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Nov 1, 2025
return;

// Number of rows to process for this workgroup
const uint rows_to_process = min(NUM_ROWS, p.stride_d - first_row);
Collaborator

I'd be pretty surprised if the changes in this function helped: a row count computed at run time like this will prevent the compiler from unrolling the loops.
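
To illustrate the reviewer's concern, here is a minimal CPU-side sketch in C++, not the shader from this PR. The names `row_sums_fixed` and `row_sums_clamped` and the value `NUM_ROWS = 4` are hypothetical. The first loop has a compile-time trip count, which a compiler can fully unroll. The second clamps the count at run time, mirroring `min(NUM_ROWS, p.stride_d - first_row)`, so the trip count is unknown when the code is compiled. Whether a given shader compiler or driver actually unrolls either form is not something this sketch can show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t NUM_ROWS = 4;  // hypothetical compile-time tile height

// Row loop bounded by a constant: the trip count is known at compile time,
// so the compiler is free to unroll it completely.
void row_sums_fixed(const float* data, std::size_t cols, float* out) {
    for (std::size_t r = 0; r < NUM_ROWS; ++r) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            sum += data[r * cols + c];
        }
        out[r] = sum;
    }
}

// Row loop bounded by a value clamped at run time: the trip count is only
// known during execution, which generally rules out full unrolling.
void row_sums_clamped(const float* data, std::size_t cols,
                      std::size_t rows_remaining, float* out) {
    const std::size_t rows_to_process = std::min(NUM_ROWS, rows_remaining);
    for (std::size_t r = 0; r < rows_to_process; ++r) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < cols; ++c) {
            sum += data[r * cols + c];
        }
        out[r] = sum;
    }
}
```

Both functions produce the same sums when at least `NUM_ROWS` rows remain. One common way to keep the fast path unrollable is to call the fixed-count version in that case and fall back to the clamped version only for the final partial tile.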
